A Comparison of Data Mining Tools using the implementation of C4.5 Algorithm

Author

  • Divya Jain
Abstract

This paper applies data mining tools to a healthcare dataset to find the important parameters that reflect the effect of diabetes on patients' kidneys, based on Kidney Function Tests (KFT). The tools used are Tanagra and Weka, both running the C4.5 algorithm, which builds decision trees. The paper compares the results given by Weka and Tanagra. Analysis of the output of both tools leads to the conclusion that both work well on the dataset, but Tanagra is more efficient and less error-prone in terms of classifier performance. Effective use of data mining tools makes it possible to identify the important parameters that reflect the effect of diabetes on the kidney. Additionally, Weka is found to perform best in “Use Training Set” mode, followed by cross-validation and then percentage split mode, when these are used to train and evaluate the classifier.
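As a rough illustration of the decision-tree criterion behind C4.5, the sketch below computes the gain ratio that C4.5 uses to choose a split attribute. It is a minimal pure-Python sketch, not the Weka or Tanagra implementation: it handles categorical attributes only and leaves out continuous thresholds, missing values, and pruning. The records and the attribute names `urea`, `creatinine`, and `affected` are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    total = len(rows)
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])

    # Information gain: entropy before the split minus weighted entropy after it.
    before = entropy([row[target] for row in rows])
    after = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = before - after

    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records; names and values are assumptions for illustration only.
rows = [
    {"urea": "high", "creatinine": "high", "affected": "yes"},
    {"urea": "high", "creatinine": "normal", "affected": "yes"},
    {"urea": "normal", "creatinine": "high", "affected": "no"},
    {"urea": "normal", "creatinine": "normal", "affected": "no"},
]

# Pick the attribute with the highest gain ratio as the split for the root node.
best = max(["urea", "creatinine"], key=lambda a: gain_ratio(rows, a, "affected"))
print(best)
```

In this toy data the `urea` attribute separates the two classes perfectly, so it would be chosen as the root split. A full C4.5 tree repeats the same selection recursively on each partition.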

Similar resources

Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advances in technology and business, fraud detection has become a critical component of financial transactions. Given the vast amounts of data in large datasets, it becomes more difficult to detect fraudulent transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Comparison of Classifier Algorithms in the Identification of Polypharmacy and Factors Affecting it in the Elderly Patients

Introduction: Prescribing and consuming more drugs than necessary, known as polypharmacy, both wastes resources and harms patients. Polypharmacy is especially important for elderly patients; therefore, the factors affecting it must be identified and analyzed properly. Method: In this retrospective study, first, several classifier algorithms, i.e., C4.5, SVM, KNN, MLP, and BN for ...

Prediction and Diagnosis of Diabetes Mellitus using a Water Wave Optimization Algorithm

Data mining is an appropriate way to discover information and hidden patterns in large amounts of data, where the hidden patterns cannot be easily discovered in normal ways. One of the most interesting applications of data mining is the discovery of diseases and disease patterns through investigating patients' records. Early diagnosis of diabetes can reduce the effects of this devastating disea...

Publication date: 2014